Resolving Conflicts for Lower-Bounded Clustering
This paper considers the effect of non-metric distances on lower-bounded clustering, i.e., the problem of computing a partition of a given set of objects with pairwise distances such that each cluster has a certain minimum cardinality (as required, for example, for anonymisation or balanced facility location problems). We discuss lower-bounded clustering with the objective of minimising the maximum radius or diameter of the clusters. For these problems a 2-approximation exists, but only if the pairwise distances satisfy the triangle inequality; without this property, no polynomial-time constant-factor approximation is possible unless P=NP. We try to resolve, or at least soften, this effect of non-metric distances by devising particular strategies to deal with violations of the triangle inequality (conflicts). Using parameterised algorithmics, we find that if the number of such conflicts is not too large, constant-factor approximations can still be computed efficiently.
In particular, we introduce parameterised approximations with respect to not just the number of conflicts but also the vertex cover number of the conflict graph (the graph induced by the conflicts). Interestingly, we salvage the approximation ratio of 2 for diameter, while for radius it is only possible to show a ratio of 3. For the parameter vertex cover number of the conflict graph, this worsening of the ratio is shown to be unavoidable unless FPT=W[2]. We further discuss improvements for diameter by choosing the (induced) P_3-cover number of the conflict graph as parameter, and complement these by showing that, unless FPT=W[1], there exists no constant-factor parameterised approximation with respect to the parameter split vertex deletion set.
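To make the notion of a conflict concrete, the following Python sketch (names and data layout are our own, and "conflict" is read here as a pair of objects whose distance exceeds some detour over a third object) lists all conflicting pairs; these pairs form the edges of the conflict graph whose vertex cover number is used as a parameter above.

```python
from itertools import combinations

def conflict_pairs(dist):
    """Hypothetical sketch: find all triangle-inequality violations in a symmetric
    distance function `dist`, given as a dict mapping frozenset({u, v}) to a
    non-negative number for every pair of objects.  A pair {u, v} is treated as a
    conflict if some third object w witnesses d(u, v) > d(u, w) + d(w, v)."""
    objects = {x for pair in dist for x in pair}
    conflicts = set()
    for u, v in combinations(objects, 2):
        for w in objects - {u, v}:
            if dist[frozenset({u, v})] > dist[frozenset({u, w})] + dist[frozenset({w, v})]:
                conflicts.add(frozenset({u, v}))
                break
    # The conflict graph has the objects as vertices and these pairs as edges.
    return conflicts
```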
Fine-Grained Complexity of Regular Path Queries
A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ evaluation (called the PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near-optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
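As an illustration of the PG-approach described above, here is a minimal Python sketch (data layout and names are assumptions, not the paper's notation): it builds the product of the labelled database graph with an NFA for q and returns all pairs (u, v) from which an accepting product state is reachable.

```python
from collections import deque

def evaluate_rpq(db_edges, nfa_delta, initial, finals, nodes):
    """Minimal sketch of the PG-approach.
    db_edges:  labelled database edges as triples (u, a, v)
    nfa_delta: NFA transitions for the RPQ q as triples (state, a, state')
    initial:   initial NFA state; finals: set of accepting NFA states
    nodes:     the database nodes
    Returns all pairs (u, v) connected by a path labelled with a word in L(q)."""
    # Adjacency of the product graph: (node, state) --a--> (node', state').
    adj = {}
    for (u, a, v) in db_edges:
        for (p, b, q) in nfa_delta:
            if a == b:
                adj.setdefault((u, p), []).append((v, q))
    answers = set()
    for source in nodes:
        # BFS in the product graph from (source, initial).
        seen = {(source, initial)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((source, node))
            for succ in adj.get((node, state), []):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return answers
```

Note that this naive version materialises the whole answer set before reporting anything; the enumeration perspective above asks instead how quickly consecutive answers can be produced.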
Shortest Distances as Enumeration Problem
We investigate the single source shortest distance (SSSD) and all pairs shortest distance (APSD) problems as enumeration problems (on unweighted and integer-weighted graphs), meaning that the elements (u, v, d(u, v)) -- where u and v are vertices with shortest distance d(u, v) -- are produced and listed one by one without repetition. The performance is measured in the RAM model of computation with respect to preprocessing time and delay, i.e., the maximum time that elapses between two consecutive outputs. This point of view reveals that specific types of output (e.g., excluding the non-reachable pairs (u, v, ∞), or excluding the self-distances (u, u, 0)) and the order of enumeration (e.g., sorted by distance, sorted row-wise with respect to the distance matrix) have a huge impact on the complexity of APSD, while they appear to have no effect on SSSD.
In particular, we show for APSD that enumeration without output restrictions is possible with delay in the order of the average degree. Excluding non-reachable pairs, or requesting the output to be sorted by distance, increases this delay to the order of the maximum degree. Further, for weighted graphs, a delay in the order of the average degree is also not possible without preprocessing or considering self-distances as output. In contrast, for SSSD we find that a delay in the order of the maximum degree without preprocessing is attainable and unavoidable for any of these requirements.
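The following Python sketch (our own illustration, not one of the paper's algorithms) shows SSSD as an enumeration problem on an unweighted graph: a plain BFS that yields one element per dequeued vertex, so the time between two consecutive outputs is bounded by the work spent on a single neighbourhood, i.e., the delay is in the order of the maximum degree and no preprocessing is needed.

```python
from collections import deque

def enumerate_sssd(adj, s):
    """Enumerate the SSSD output (s, v, d(s, v)) one element at a time for an
    unweighted graph given as an adjacency list `adj` (dict: vertex -> neighbours).
    Non-reachable vertices are never output; the source is output as (s, s, 0)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        yield (s, u, dist[u])          # one output element
        for v in adj[u]:               # work between outputs: one neighbourhood scan
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
```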
On Counting (Quantum-)Graph Homomorphisms in Finite Fields of Prime Order
We study the problem of counting the number of homomorphisms from an input graph G to a fixed (quantum) graph H in any finite field of prime order p. The subproblem where the target is a graph was introduced by Faben and Jerrum [ToC'15], and its complexity is still uncharacterised despite active research, e.g. the very recent work of Focke, Goldberg, Roth, and Živný [SODA'21]. Our contribution is threefold. First, we introduce quantum graphs to the study of modular counting homomorphisms. We show that the complexity for a quantum graph H collapses to the complexity criteria found at dimension 1: graphs. Second, in order to prove cases of intractability, we establish a further reduction to the study of bipartite graphs. Lastly, we establish a dichotomy for all bipartite ...-free graphs by a thorough structural study incorporating both local and global arguments. This result subsumes all results on bipartite graphs known for all prime moduli and extends them significantly. Even for the subproblem with p = 2 this establishes new results.
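For readers unfamiliar with the underlying counting problem, the following brute-force Python sketch (purely illustrative and exponential-time, not the paper's method) counts the homomorphisms from an input graph G to a fixed graph H modulo a prime p.

```python
from itertools import product

def count_homs_mod_p(G_nodes, G_edges, H_nodes, H_edges, p):
    """Count homomorphisms from G to H modulo the prime p by exhaustive search.
    A map f: V(G) -> V(H) is a homomorphism if every edge of G is mapped to an
    edge of H.  Graphs are given by node lists and edge lists of pairs."""
    H_edge_set = {frozenset(e) for e in H_edges}
    count = 0
    for values in product(H_nodes, repeat=len(G_nodes)):
        f = dict(zip(G_nodes, values))
        if all(frozenset({f[u], f[v]}) in H_edge_set for (u, v) in G_edges):
            count = (count + 1) % p
    return count
```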
Fair Correlation Clustering in Forests
The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalisation by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. Most research on this version of fair clustering has focused on centroid-based objectives.
In contrast, we discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view.
While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable.
Perhaps the most surprising insight is that the cause of the hardness of Fair Correlation Clustering is not the strictness of the fairness condition: we lift most of our results to also hold for a relaxed version of the fairness condition. Instead, the source of hardness seems to be the distribution of the sensitive attribute. On the positive side, we identify some reasonable distributions that are indeed tractable. While this tractability is only shown for forests, it may open an avenue to designing reasonable approximations for larger graph classes.
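To make the fairness notion above concrete, the following Python sketch (function and argument names are our own) checks whether a clustering is fair in the sense of Chierichetti et al.: every cluster must exhibit exactly the same relative distribution of the sensitive attribute as the whole input set.

```python
from collections import Counter
from fractions import Fraction

def is_fair(clusters, colour):
    """clusters: list of lists of objects; colour: dict mapping each object to its
    manifestation of the sensitive attribute.  Returns True iff every cluster has
    the same relative colour distribution as the whole input."""
    everything = [x for cluster in clusters for x in cluster]
    total = Counter(colour[x] for x in everything)
    n = len(everything)
    for cluster in clusters:
        local = Counter(colour[x] for x in cluster)
        for value, count in total.items():
            # Compare exact fractions to avoid floating-point artefacts.
            if Fraction(local.get(value, 0), len(cluster)) != Fraction(count, n):
                return False
    return True
```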
Building Clusters with Lower-Bounded Sizes
Classical clustering problems search for a partition of objects into a fixed number of clusters. In many scenarios, however, the number of clusters is not known or not necessarily fixed. Further, clusters are sometimes only considered to be of significance if they have a certain size. We discuss clustering into sets of minimum cardinality k without a fixed number of sets and present a general model for these types of problems. This general framework allows the comparison of different measures to assess the quality of a clustering. We specifically consider nine quality measures and classify the complexity of the resulting problems with respect to k. Further, we derive some polynomial-time solvable cases for k = 2 with connections to matching-type problems, which, among other graph problems, are then used to compute approximations for larger values of k.
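As a toy illustration of the matching connection for k = 2 (not the paper's algorithm; names and the aggregation function are assumptions, and clusters are restricted to exactly two objects for simplicity), the following Python sketch enumerates all partitions of an even-sized object set into pairs and keeps the one minimising a given cost of the intra-cluster distances. Such a partition is a perfect matching in the complete graph on the objects, so for suitable objectives it can also be found in polynomial time with matching algorithms.

```python
def best_pairing(objects, dist, cost):
    """Exhaustively search all partitions of an even-sized list `objects` into
    clusters of size two.  `dist` maps frozenset({u, v}) to the distance of u and v,
    and `cost` aggregates the list of intra-cluster distances (e.g. sum or max)."""
    def partitions(rest):
        if not rest:
            yield []
            return
        first, *others = rest
        for i, second in enumerate(others):
            for tail in partitions(others[:i] + others[i + 1:]):
                yield [(first, second)] + tail
    return min(partitions(list(objects)),
               key=lambda pairs: cost([dist[frozenset(pair)] for pair in pairs]))
```

For example, best_pairing(["a", "b", "c", "d"], dist, sum) returns, for a suitable dist, the pairing with the smallest total intra-cluster distance.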